{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "846ef4f4-ee38-4a42-a7d3-1a23826e4830",
      "metadata": {},
      "source": [
        "# How to map values to a database\n",
        "\n",
        "In this guide we'll go over strategies to improve graph database query generation by mapping values from user inputs to database.\n",
        "When using the built-in graph chains, the LLM is aware of the graph schema, but has no information about the values of properties stored in the database.\n",
        "Therefore, we can introduce a new step in graph database QA system to accurately map values."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "26677b08",
      "metadata": {},
      "source": [
        "## Setup\n",
        "#### Install dependencies\n",
        "\n",
        "```{=mdx}\n",
        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
        "\n",
        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
        "\n",
        "<Npm2Yarn>\n",
        "  langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod\n",
        "</Npm2Yarn>\n",
        "```\n",
        "\n",
        "#### Set environment variables\n",
        "\n",
        "We'll use OpenAI in this example:\n",
        "\n",
        "```env\n",
        "OPENAI_API_KEY=your-api-key\n",
        "\n",
        "# Optional, use LangSmith for best-in-class observability\n",
        "LANGSMITH_API_KEY=your-api-key\n",
        "LANGSMITH_TRACING=true\n",
        "\n",
        "# Reduce tracing latency if you are not in a serverless environment\n",
        "# LANGCHAIN_CALLBACKS_BACKGROUND=true\n",
        "```\n",
        "\n",
        "Next, we need to define Neo4j credentials.\n",
        "Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database.\n",
        "\n",
        "```env\n",
        "NEO4J_URI=\"bolt://localhost:7687\"\n",
        "NEO4J_USERNAME=\"neo4j\"\n",
        "NEO4J_PASSWORD=\"password\"\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "50fa4510-29b7-49b6-8496-5e86f694e81f",
      "metadata": {},
      "source": [
        "The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "4ee9ef7a-eef9-4289-b9fd-8fbc31041688",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Schema refreshed successfully.\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import \"neo4j-driver\";\n",
        "import { Neo4jGraph } from \"@langchain/community/graphs/neo4j_graph\";\n",
        "\n",
        "const url = process.env.NEO4J_URI;\n",
        "const username = process.env.NEO4J_USER;\n",
        "const password = process.env.NEO4J_PASSWORD;\n",
        "const graph = await Neo4jGraph.initialize({ url, username, password });\n",
        "\n",
        "// Import movie information\n",
        "const moviesQuery = `LOAD CSV WITH HEADERS FROM \n",
        "'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
        "AS row\n",
        "MERGE (m:Movie {id:row.movieId})\n",
        "SET m.released = date(row.released),\n",
        "    m.title = row.title,\n",
        "    m.imdbRating = toFloat(row.imdbRating)\n",
        "FOREACH (director in split(row.director, '|') | \n",
        "    MERGE (p:Person {name:trim(director)})\n",
        "    MERGE (p)-[:DIRECTED]->(m))\n",
        "FOREACH (actor in split(row.actors, '|') | \n",
        "    MERGE (p:Person {name:trim(actor)})\n",
        "    MERGE (p)-[:ACTED_IN]->(m))\n",
        "FOREACH (genre in split(row.genres, '|') | \n",
        "    MERGE (g:Genre {name:trim(genre)})\n",
        "    MERGE (m)-[:IN_GENRE]->(g))`\n",
        "\n",
        "await graph.query(moviesQuery);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0cb0ea30-ca55-4f35-aad6-beb57453de66",
      "metadata": {},
      "source": [
        "## Detecting entities in the user input\n",
        "We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "id": "e1a19424-6046-40c2-81d1-f3b88193a293",
      "metadata": {},
      "outputs": [],
      "source": [
        "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
        "import { ChatOpenAI } from \"@langchain/openai\";\n",
        "import { z } from \"zod\";\n",
        "\n",
        "const llm = new ChatOpenAI({ model: \"gpt-3.5-turbo\", temperature: 0 })\n",
        "\n",
        "const entitySchema = z.object({\n",
        "    names: z.array(z.string()).describe(\"All the person or movies appearing in the text\"),\n",
        "}).describe(\"Identifying information about entities.\");\n",
        "\n",
        "\n",
        "const prompt = ChatPromptTemplate.fromMessages(\n",
        "  [\n",
        "    [\n",
        "      \"system\",\n",
        "      \"You are extracting person and movies from the text.\"\n",
        "    ],\n",
        "    [\n",
        "      \"human\",\n",
        "      \"Use the given format to extract information from the following\\ninput: {question}\"\n",
        "    ]\n",
        "  ]\n",
        ");\n",
        "\n",
        "const entityChain = prompt.pipe(llm.withStructuredOutput(entitySchema));"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9c14084c-37a7-4a9c-a026-74e12961c781",
      "metadata": {},
      "source": [
        "We can test the entity extraction chain."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "id": "bbfe0d8f-982e-46e6-88fb-8a4f0d850b07",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{ names: [ \u001b[32m\"Casino\"\u001b[39m ] }"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "const entities = await entityChain.invoke({ question: \"Who played in Casino movie?\" })\n",
        "entities"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a8afbf13-05d0-4383-8050-f88b8c2f6fab",
      "metadata": {},
      "source": [
        "We will utilize a simple `CONTAINS` clause to match entities to database. In practice, you might want to use a fuzzy search or a fulltext index to allow for minor misspellings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "id": "6f92929f-74fb-4db2-b7e1-eb1e9d386a67",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"Casino maps to Casino Movie in database\\n\"\u001b[39m"
            ]
          },
          "execution_count": 18,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "const matchQuery = `\n",
        "MATCH (p:Person|Movie)\n",
        "WHERE p.name CONTAINS $value OR p.title CONTAINS $value\n",
        "RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type\n",
        "LIMIT 1`\n",
        "\n",
        "const matchToDatabase = async (values) => {\n",
        "    let result = \"\"\n",
        "    for (const entity of values.names) {\n",
        "        const response = await graph.query(matchQuery, {\n",
        "            value: entity\n",
        "        })\n",
        "        if (response.length > 0) {\n",
        "            result += `${entity} maps to ${response[0][\"result\"]} ${response[0][\"type\"]} in database\\n`\n",
        "        }\n",
        "    }\n",
        "    return result\n",
        "}\n",
        "\n",
        "await matchToDatabase(entities)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f66c6756-6efb-4b1e-9b5d-87ed914a5212",
      "metadata": {},
      "source": [
        "## Custom Cypher generating chain\n",
        "\n",
        "We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.\n",
        "We will be using the LangChain expression language to accomplish that."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "id": "8ef3e21d-f1c2-45e2-9511-4920d1cf6e7e",
      "metadata": {},
      "outputs": [],
      "source": [
        "import { StringOutputParser } from \"@langchain/core/output_parsers\";\n",
        "import { RunnablePassthrough, RunnableSequence } from \"@langchain/core/runnables\";\n",
        "\n",
        "// Generate Cypher statement based on natural language input\n",
        "const cypherTemplate = `Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:\n",
        "{schema}\n",
        "Entities in the question map to the following database values:\n",
        "{entities_list}\n",
        "Question: {question}\n",
        "Cypher query:`\n",
        "\n",
        "const cypherPrompt = ChatPromptTemplate.fromMessages(\n",
        "    [\n",
        "        [\n",
        "            \"system\",\n",
        "            \"Given an input question, convert it to a Cypher query. No pre-amble.\",\n",
        "        ],\n",
        "        [\"human\", cypherTemplate]\n",
        "    ]\n",
        ")\n",
        "\n",
        "const llmWithStop = llm.bind({ stop: [\"\\nCypherResult:\"] })\n",
        "\n",
        "const cypherResponse = RunnableSequence.from([\n",
        "    RunnablePassthrough.assign({ names: entityChain }),\n",
        "    RunnablePassthrough.assign({\n",
        "        entities_list: async (x) => matchToDatabase(x.names),\n",
        "        schema: async (_) => graph.getSchema(),\n",
        "    }),\n",
        "    cypherPrompt,\n",
        "    llmWithStop,\n",
        "    new StringOutputParser(),\n",
        "])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "id": "1f0011e3-9660-4975-af2a-486b1bc3b954",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m'MATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor)\\nRETURN actor.name'\u001b[39m"
            ]
          },
          "execution_count": 20,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "const cypher = await cypherResponse.invoke({\"question\": \"Who played in Casino movie?\"})\n",
        "cypher"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Deno",
      "language": "typescript",
      "name": "deno"
    },
    "language_info": {
      "file_extension": ".ts",
      "mimetype": "text/x.typescript",
      "name": "typescript",
      "nb_converter": "script",
      "pygments_lexer": "typescript",
      "version": "5.3.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
